Podcastle: Improvements of Speech Recognition by Using Acoustic Modeling Based on Wisdom of Crowds

نویسندگان

  • Jun Ogata
  • Masataka Goto
چکیده

1 はじめに 我々は,ポッドキャストを音声認識によって 自動的にテキスト化することで,それらをユー ザが全文検索できるだけではなく,詳細な閲覧, 編集も可能なソーシャルアノテーションシステ ム「PodCastle1)2)3)」の開発,運営を行っている. ポッドキャストは実環境の多様な音声データであ り,従来の音声認識技術では高い認識率を達成す ることは難しい.そこで PodCastleでは,多数 のユーザに認識誤りを訂正 (アノテーション)す る協力をしてもらうことで,音声認識率をシス テムの運用中に向上させる枠組みを採用してい る.こうすることで,検索サービスとしての質を 向上させるだけでなく,音声認識技術の底上げを はかることも狙っている. 本研究では,上記の枠組みの一環として,PodCastleを通じて得られる集合知,すなわちユー ザによる音声認識誤りの訂正結果を活用した音 響モデル学習について検討し,ポッドキャスト音 声認識において評価を行う.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Podcastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription

This paper presents acoustic-model-training techniques for improving automatic transcription of podcasts. A typical approach for acoustic modeling is to create a task-specific corpus including hundreds (or even thousands) of hours of speech data and their accurate transcriptions. This approach, however, is impractical in podcast-transcription task because manual generation of the transcriptions...

متن کامل

PodCastle: Collaborative Training of Language Models on the Basis of Wisdom of Crowds

This paper presents a language-model training method for improving automatic transcription of online spoken contents. Unlike previously studied LVCSR tasks such as broadcast news and lectures, large-sized task-specific corpora for training language models cannot be prepared and used in recognition because of the diversity of topics, vocabularies, and speaking styles. To overcome difficulties in...

متن کامل

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

متن کامل

Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...

متن کامل

شبکه عصبی پیچشی با پنجره‌های قابل تطبیق برای بازشناسی گفتار

Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009